FDive: Learning Relevance Models using Pattern-based Similarity Measures
The detection of interesting patterns in large high-dimensional datasets is
difficult because of their dimensionality and pattern complexity. Therefore,
analysts require automated support for the extraction of relevant patterns. In
this paper, we present FDive, a visual active learning system that helps to
create visually explorable relevance models, assisted by learning a
pattern-based similarity. We use a small set of user-provided labels to rank
similarity measures, consisting of feature descriptor and distance function
combinations, by their ability to distinguish relevant from irrelevant data.
Based on the best-ranked similarity measure, the system calculates an
interactive Self-Organizing Map-based relevance model, which classifies data
according to the cluster affiliation. It also automatically prompts further
relevance feedback to improve its accuracy. Uncertain areas, especially near
the decision boundaries, are highlighted and can be refined by the user. We
evaluate our approach by comparison to state-of-the-art feature selection
techniques and demonstrate the usefulness of our approach by a case study
classifying electron microscopy images of brain cells. The results show that
FDive enhances both the quality and understanding of relevance models and can
thus lead to new insights for brain research.
Comment: 12 pages, 7 figures, 2 tables, LaTeX; corrected typo; added DO
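The ranking step described in the FDive abstract (scoring descriptor and distance combinations by how well they separate labeled relevant from irrelevant items) can be illustrated with a minimal sketch. The abstract does not state the scoring function FDive uses, so the ratio of mean between-class to mean within-class distance below is an assumption, and all names (`separation_score`, `rank_measures`, the toy descriptors) are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def separation_score(items, labels, descriptor, distance):
    """Assumed score: mean between-class distance / mean within-class distance.

    Requires at least one same-label pair and one mixed-label pair.
    """
    feats = [descriptor(item) for item in items]
    within, between = [], []
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            d = distance(feats[i], feats[j])
            (within if labels[i] == labels[j] else between).append(d)
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return mean_between / (mean_within + 1e-12)  # epsilon avoids division by zero


def rank_measures(items, labels, descriptors, distances):
    """Score every descriptor/distance combination; best-separating first."""
    scored = [
        (separation_score(items, labels, desc, dist), dname, fname)
        for dname, desc in descriptors.items()
        for fname, dist in distances.items()
    ]
    return sorted(scored, reverse=True)


# Toy data: only the first coordinate carries the relevance signal.
items = [(0, 5), (1, 0), (10, 5), (11, 0)]
labels = [False, False, True, True]  # user-provided relevance labels
descriptors = {
    "first": lambda it: (it[0],),   # hypothetical feature descriptors
    "second": lambda it: (it[1],),
}
distances = {"euclidean": euclidean, "manhattan": manhattan}

ranking = rank_measures(items, labels, descriptors, distances)
```

On this toy data, the combinations built on the "first" descriptor rank above those built on "second", since only the first coordinate distinguishes the two label groups.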
Communication Analysis through Visual Analytics: Current Practices, Challenges, and New Frontiers
The automated analysis of digital human communication data often focuses on
specific aspects such as content or network structure in isolation. This can
provide limited perspectives while making cross-methodological analyses,
occurring in domains like investigative journalism, difficult. Communication
research in psychology and the digital humanities instead stresses the
importance of a holistic approach to overcome these limiting factors. In this
work, we conduct an extensive survey on the properties of over forty
semi-automated communication analysis systems and investigate how they cover
concepts described in theoretical communication research. From these
investigations, we derive a design space and contribute a conceptual framework
based on communication research, technical considerations, and the surveyed
approaches. The framework describes the systems' properties, capabilities, and
composition through a wide range of criteria organized in the dimensions (1)
Data, (2) Processing and Models, (3) Visual Interface, and (4) Knowledge
Generation. These criteria enable a formalization of digital communication
analysis through visual analytics, which, we argue, is uniquely suited for this
task by tackling automation complexity while leveraging domain knowledge. With
our framework, we identify shortcomings and research challenges, such as group
communication dynamics, trust and privacy considerations, and holistic
approaches. Simultaneously, our framework supports the evaluation of systems
and promotes the mutual exchange between researchers through a structured
common language, laying the foundations for future research on communication
analysis.
Comment: 11 pages, 2 tables, 1 figure
VulnEx: Exploring Open-Source Software Vulnerabilities in Large Development Organizations to Understand Risk Exposure
The prevalent usage of open-source software (OSS) has led to an increased interest in resolving potential third-party security risks by fixing common vulnerabilities and exposures (CVEs). However, even with automated code analysis tools in place, security analysts often lack the means to obtain an overview of vulnerable OSS reuse in large software organizations. In this design study, we propose VulnEx (Vulnerability Explorer), a tool to audit entire software development organizations. We introduce three complementary table-based representations to identify and assess vulnerability exposures due to OSS, which we designed in collaboration with security analysts. The presented tool allows examining problematic projects and applications (repositories), third-party libraries, and vulnerabilities across a software organization. We show the applicability of our tool through a use case and preliminary expert feedback.
ParSetgnostics: Quality Metrics for Parallel Sets
While there are many visualization techniques for exploring numeric data, only a few work with categorical data. One prominent example is Parallel Sets, showing data frequencies instead of data points - analogous to parallel coordinates for numerical data. As nominal data does not have an intrinsic order, the design of Parallel Sets is sensitive to visual clutter due to overlaps, crossings, and subdivision of ribbons, hindering readability and pattern detection. In this paper, we propose a set of quality metrics, called ParSetgnostics (Parallel Sets diagnostics), which aim to improve Parallel Sets by reducing clutter. These quality metrics quantify important properties of Parallel Sets such as overlap, orthogonality, ribbon width variance, and mutual information to optimize the category and dimension ordering. By conducting a systematic correlation analysis between the individual metrics, we ensure their distinctiveness. Further, we evaluate the clutter reduction effect of ParSetgnostics by reconstructing six datasets from previous publications using Parallel Sets, measuring and comparing their respective properties. Our results show that ParSetgnostics facilitates multi-dimensional analysis of categorical data by automatically providing optimized Parallel Set designs with a clutter reduction of up to 81% compared to the originally proposed Parallel Sets visualizations.
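Of the metrics named in the ParSetgnostics abstract, mutual information has a standard textbook definition that is easy to sketch for two categorical dimensions. The abstract does not say how the paper normalizes or applies the value when ordering dimensions, so the function below is only the generic definition in bits, with a hypothetical name.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long categorical columns."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_joint = c / n
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_joint * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Two perfectly dependent dimensions share one bit of information.
dependent = mutual_information(["a", "a", "b", "b"], ["u", "u", "v", "v"])
# Two independent dimensions share none.
independent = mutual_information(["a", "a", "b", "b"], ["u", "v", "u", "v"])
```

Here `dependent` evaluates to 1.0 and `independent` to 0.0; a higher value indicates that knowing a record's category in one dimension tells more about its category in the other.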